Towards Automatic Construction of Diverse, High-quality Image Dataset

نویسندگان

  • Yazhou Yao
  • Jian Zhang
  • Fumin Shen
  • Dongxiang Zhang
  • Zhenmin Tang
  • Heng Tao Shen
چکیده

The availability of labeled image datasets has been shown critical for high-level image understanding, which continuously drives the progress of feature designing and models developing. However, constructing labeled image datasets is laborious and monotonous. To eliminate manual annotation, in this work, we propose a novel image dataset construction framework by employing multiple textual metadata. We aim at collecting diverse and accurate images for given queries from the Web. Specifically, we formulate noisy textual metadata removing and noisy images filtering as a multi-view and multi-instance learning problem separately. Our proposed approach not only improves the accuracy, but also enhances the diversity of the selected images. To verify the effectiveness of our proposed approach, we construct an image dataset with 100 categories. The experiments show significant performance gains by using the generated data of our approach on several tasks, such as image classification, cross-dataset generalization and object detection. The proposed method also consistently outperforms existing weakly supervised and web-supervised approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hierarchical Convolutional Neural Network for Segmentation of Stroke Lesion in 3D Brain MRI

Introduction: Brain tumors such as glioma are among the most aggressive lesions, which result in a very short life expectancy in patients. Image segmentation is highly essential in medical image analysis with applications, particularly in clinical practices to treat brain tumors. Accurate segmentation of magnetic resonance data is crucial for diagnostic purposes, planning surgical treatments, a...

متن کامل

A hierarchical Convolutional Neural Network for Segmentation of Stroke Lesion in 3D Brain MRI

Introduction: Brain tumors such as glioma are among the most aggressive lesions, which result in a very short life expectancy in patients. Image segmentation is highly essential in medical image analysis with applications, particularly in clinical practices to treat brain tumors. Accurate segmentation of magnetic resonance data is crucial for diagnostic purposes, planning surgical treatments, a...

متن کامل

Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images

As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...

متن کامل

Well Begun Is Half Done: Generating High-Quality Seeds for Automatic Image Dataset Construction from Web

We present a fully automatic approach to construct a large-scale, highprecision dataset from noisy web images. Within the entire pipeline, we focus on generating high quality seed images for subsequent dataset growing. High quality seeds are essential as we revealed, but they have received relatively less attention in previous works with respect to how to automatically generate them. In this wo...

متن کامل

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1708.06495  شماره 

صفحات  -

تاریخ انتشار 2017